180 min approx
Use the pivot_*, separate, and unite functions from the tidyr package in the Tidyverse to reshape data into a tidy format.
Illustrations from the Openscapes blog Tidy Data for reproducibility, efficiency, and collaboration by Julia Lowndes and Allison Horst
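There are three interrelated rules that make a dataset tidy:
Each variable must have its own column.
Each observation must have its own row.
Each value must have its own cell.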
Example: the tidyr::billboard dataset.
tidyr::pivot_longer
Important
tidyr::pivot_longer converts your data to a “longer” format.
cols: select which columns should be pivoted.
names_to: name the new column that will hold the names of the cols columns.
values_to: name the new column that will hold the values of the cols columns.
Warning
Pivoting can produce many rows with possibly uninformative missing values!
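The following is a minimal sketch of the call described above, assuming the billboard dataset that ships with tidyr (weekly chart ranks stored in columns wk1 to wk76). The new column names week and rank are illustrative choices, not names taken from the lesson.

```r
library(tidyr)

# billboard: one row per song, weekly ranks spread over wk1..wk76
long <- pivot_longer(
  billboard,
  cols = wk1:wk76,     # columns to pivot
  names_to = "week",   # new column holding the old column names
  values_to = "rank"   # new column holding the old cell values
)

dim(billboard)          # wide shape
dim(long)               # long shape: one row per song and week
sum(is.na(long$rank))   # rows with a missing rank (song not charting that week)
```

Every song gets one row per week column, so weeks in which a song was no longer on the chart show up as rows with a missing rank.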
tidyr::pivot_longer
Important
tidyr::pivot_longer converts your data to a “longer” format.
cols: select which columns should be pivoted.
names_to: name the new column that will hold the names of the cols columns.
values_to: name the new column that will hold the values of the cols columns.
values_drop_na: decide whether rows with missing values in the values_to column should be dropped.
The cols argument accepts tidyselect selection helpers:
var1:var10: variables lying between var1 on the left and var10 on the right.
starts_with("a"): names that start with “a”.
ends_with("z"): names that end with “z”.
contains("b"): names that contain “b”.
matches("x.y"): names that match the regular expression x.y.
num_range("x", 1:4): names following the pattern x1, x2, …, x4.
all_of(vars)/any_of(vars): names stored in the character vector vars. all_of(vars) will error if any of the variables aren’t present; any_of(vars) will match just the variables that exist.
everything(): all variables.
last_col(): furthest column on the right.
where(is.numeric): all variables where is.numeric() returns TRUE.
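As a hedged illustration of the selection helpers and of values_drop_na, the sketch below again assumes the tidyr billboard dataset; the output column names week and rank are illustrative.

```r
library(tidyr)

# Select the week columns by prefix and drop rows whose rank is missing
long_complete <- pivot_longer(
  billboard,
  cols = starts_with("wk"),
  names_to = "week",
  values_to = "rank",
  values_drop_na = TRUE
)

# num_range("wk", 1:76) selects the same columns in this dataset
same_cols <- pivot_longer(
  billboard,
  cols = num_range("wk", 1:76),
  names_to = "week",
  values_to = "rank",
  values_drop_na = TRUE
)

anyNA(long_complete$rank)            # no missing ranks remain
identical(long_complete, same_cols)  # both selections give the same result
```

Dropping missing values is only appropriate when the missingness is structural, as here, where a missing rank simply means the song was not on the chart that week.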
Tip
!selection: only variables that don’t match selection.
selection1 & selection2: only variables included in both selection1 and selection2.
selection1 | selection2: all variables that match either selection1 or selection2.
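The Boolean operators above can be sketched on the tidyr billboard dataset as follows. The object names and the particular selections are illustrative assumptions chosen only to show each operator.

```r
library(tidyr)

# !selection: pivot everything except the identifying columns
not_ids <- pivot_longer(
  billboard,
  cols = !c(artist, track, date.entered),
  names_to = "week", values_to = "rank"
)

# selection1 & selection2: names that start with "wk" AND end with "1"
# (wk1, wk11, wk21, ..., wk71)
both <- pivot_longer(
  billboard,
  cols = starts_with("wk") & ends_with("1"),
  names_to = "week", values_to = "rank"
)

# selection1 | selection2: either wk1 or wk2
either <- pivot_longer(
  billboard,
  cols = wk1 | wk2,
  names_to = "week", values_to = "rank"
)

unique(both$week)
unique(either$week)
```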
Tip
When each column name encodes more than one variable, you can pivot the columns while keeping that underlying structure in the names. You can then split them into separate columns in a second step using tidyr::separate.
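A minimal sketch of this two-step approach follows. The small table and its column names (weight_pre, height_post, and so on) are hypothetical and made up for illustration; they are not part of the lesson data.

```r
library(tidyr)

# Hypothetical wide table: each column name encodes a measure and a time point
wide <- tibble::tibble(
  id = 1:2,
  weight_pre = c(70, 82),
  weight_post = c(68, 80),
  height_pre = c(170, 181),
  height_post = c(170, 181)
)

# Step 1: pivot, keeping the combined names in a single column
long <- pivot_longer(
  wide,
  cols = !id,
  names_to = "measure_time",
  values_to = "value"
)

# Step 2: split the combined names into one column per variable
tidy <- separate(
  long,
  col = measure_time,
  into = c("measure", "time"),
  sep = "_"
)

tidy
```

The sep = "_" argument assumes the underscore appears exactly once in each name; names with a different structure would need a different separator or pattern.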
tidyr::pivot_wider
Image from Data Carpentry’s R for Social Scientists
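As a rough sketch of the reverse operation, pivot_wider spreads a name column and a value column back into one column per name. The small table below is hypothetical and only meant to show the names_from and values_from arguments.

```r
library(tidyr)

# Hypothetical long table: one row per subject and measure
long <- tibble::tibble(
  id = c(1, 1, 2, 2),
  key = c("height", "weight", "height", "weight"),
  value = c(170, 70, 181, 82)
)

wide <- pivot_wider(
  long,
  names_from = key,     # column whose values become the new column names
  values_from = value   # column whose values fill the new columns
)

wide
```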
tidyr::pivot_wider
solution.R
To create the current lesson, we explored, used, and adapted content from the following resources:
The slides are made using Posit’s Quarto open-source scientific and technical publishing system, powered in R by Yihui Xie’s knitr.
This work by Corrado Lanera, Ileana Baldi, and Dario Gregori is licensed under CC BY 4.0

UBEP’s R training for supervisors